Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(snap): avoid refresh on package_upgrade: true and refresh.hold #5426

Merged
merged 2 commits into from
Aug 7, 2024

Conversation

blackboxsw
Copy link
Collaborator

@blackboxsw blackboxsw commented Jun 20, 2024

-- 07102024: updated proposed commit message, reverted behavior to only honor refresh.hold=forever and updated example doc for temporary snap hold.

Proposed Commit Message

    feat(snap): avoid refresh on package_upgrade: true and refresh.hold
    
    When snap refresh.hold is set to forever, an admin is saying they do
    not want generic automated refreshes of snaps performed by default.
    
    This should be an indicator to cloud-init to avoid calling snap refresh
    on such systems due to a `package_upgrade: true` present in user-data.
    
    For network-limited environments with images which have the snap package
    manager but don't want to wait and timeout on snap refresh, the following
    user-data can be provided to still allow for package_upgrade: true,
    and avoid a 20-30 second wait on snaps being unable to access certain
    snap URLs.
    
       #cloud-config
       package_upgrade: true
       snap:
         commands:
           00: snap refresh --hold=forever
    
    cloud-init now interrogates the state refresh.hold value by calling
      snap get system -d
    
    If snap refresh --hold was called in that environment to set 'forever',
    cloud-init will skip calling refresh and log the reason for skipping.
    
    We cannot honor short time-based refresh.holds because the snap
    services place a short hold in early boot anyway as systemd units
    startup.
    
    Fixes: GH-5290

Additional Context

#5290

Test Steps

Checklist

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@blackboxsw blackboxsw force-pushed the snap-avoid-refresh-on-hold branch 3 times, most recently from 3b55abd to 3bf177d Compare June 21, 2024 02:14
@MggMuggins
Copy link

This looks good; thanks for including an example in the docs.

Copy link
Collaborator

@a-dubs a-dubs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Left one small question but not going to block on it.

package_upgrade: true
snap:
commands:
00: snap refresh --hold
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a comment or something that indicates that by omitting any duration, snap refresh --hold is how you set it to "forever"? For people more familiar with snap this might be obvious, but when i read To avoid calling snap refresh in images with snap installed, setting snap refresh.hold to forever, my instinct would be to put snap refresh --hold forever or something like that. Just a thought. Maybe it's overkill.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @a-dubs fixed the documented examples to be more clear about --hold=forever vs --hold=10m etc. I also updated the command used per @holmanb's review to use snap get system -d instead and process the resulting sparse config dict.

Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline comments.

cloudinit/distros/package_management/snap.py Outdated Show resolved Hide resolved
["snap", "get", "system", "refresh.hold"], rcs=[0, 1]
)

if out.strip() == "forever":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I want snap holds enabled, but not by cloud-init, how would I do that with this feature?

If I set snap refresh --hold=15m, cloud-init will still attempt the refresh.

I notice that snap returns 1 if this key isn't set. Also, I see what looks like a machine-readable flag -d, if parsing is really required.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@holmanb this is a good comment, thanks! We should support time-based holds as well as forever. Just added.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to revert this element because snap daemon itself sets a refresh.hold during early boot, so cloud-init can't distinguish between the snap deamon injected refresh.hold window and something set by user-data.

@holmanb holmanb self-assigned this Jul 1, 2024
@blackboxsw blackboxsw force-pushed the snap-avoid-refresh-on-hold branch 2 times, most recently from 390ceb7 to 8ebaeb1 Compare July 8, 2024 22:16
@blackboxsw blackboxsw requested review from a-dubs and holmanb July 8, 2024 22:18
@TheRealFalcon TheRealFalcon self-assigned this Jul 9, 2024
Copy link
Collaborator

@a-dubs a-dubs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, Chad! Just left one small grammar nit but otherwise, looks good to go to me.

try:
result = subp.subp(command)
snap_hold = (
util.load_json(result.stdout).get("refresh", {}).get("hold")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I start a new lxd container, I see a refresh hold immediately upon starting the container. I think a refresh hold exists regardless if the user has specified one and just points to the next time snap will refresh.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, yes you are correct. Just saw this on jammy and oracular images. Ok, we can't approach this feature from this angle. If someone wants temporary snap hold during boot, They can add a runcmd: snap refresh --unhold to user-data. I'll update docs and avoid the time-based hold since it looks to be set during boot anyway that ties directly to the Jul 10 22:33:58.917075 test-j systemd[1]: Started Snap Daemon. getting started in journalctl.

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To Brett's original point, the behavior here doesn't feel entirely consistent, but I don't really see a better way of approaching it.

I left one inline nit, but it's not blocking.

@@ -0,0 +1,8 @@
#cloud-config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It seems a bit odd to have 2 of our 3 examples dedicated to a workaround for a single package manager on a single OS. I don't think we really need an example showing how to unhold it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me. I'll drop that example. of needed it is a corner case for sure.

When snap refresh.hold is set to forever, an admin is saying they do
not want generic automated refreshes of snaps performed by default.

This should be an indicator to cloud-init to avoid calling snap refresh
on such systems due to a `package_upgrade: true` present in user-data.

For network-limited environments with images which have the snap package
manager but don't want to wait and timeout on snap refresh, the following
user-data can be provided to still allow for package_upgrade: true,
and avoid a 20-30 second wait on snaps being unable to access certain
snap URLs.

   #cloud-config
   package_upgrade: true
   snap:
     commands:
       00: snap refresh --hold=forever

cloud-init now interrogates the state refresh.hold value by calling
  snap get system -d

If snap refresh --hold was called in that environment to set 'forever',
cloud-init will skip calling refresh and log the reason for skipping.

We cannot honor short time-based refresh.holds because the snap
services place a short hold in early boot anyway as systemd units
startup.

Fixes: canonicalGH-5290
@blackboxsw blackboxsw added this to the cloud-init-24.3 milestone Jul 25, 2024
Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on a solution to this issue @blackboxsw.

It may solve the problem for the filed bug, but the UX isn't beautiful, and the implementation isn't elegant either - it requires a pretty major layering violation, depends on snap behaviors that are not well documented, and puts the responsibility on the user to figure this out from the examples - a needle in the haystack. This solution may work - and it has the huge benefit of being far simpler than a full redesign, but I'd rather that this isn't the end goal for the UI.

I think that it would ultimately make a lot more sense for cloud-init to stop having a unified UX for all package managers and provide more tasteful ones for ones like snap and apt individually.

I think that a configuration format like this

apt:
  upgrade: true
snap:
  upgrade: true

would be far more obvious and intuitive to reason about for an end user than

package_upgrade: true
snap:
  commands:
    00: snap refresh --hold=forever

Thoughts?

package_upgrade: true
snap:
commands:
00: snap refresh --hold=forever
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is a workaround, we should include a comment explaining why someone might want this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do include that comment in data.yaml for example2 which is rendered above the example in the modules documentation
- comment: "By default,package_upgrade: truewill trigger upgrades on any installed package manager. To avoid callingsnap refreshin images with snap installed, set snap refresh.hold toforever will prevent cloud-init's snap interaction during any boot". I'd updated this comment and the general desciprtion of the module to clarify that it interacts with any installed package managers which include apt/pkg/yum/zypper/snap

@holmanb
Copy link
Member

holmanb commented Jul 31, 2024

Thoughts?

While I still think that the above would make the most sense long term, I appreciate that the alternative that I've proposed is a significant chunk of work on top of what has already been done, and I don't want to block a workaround from existing for the short term.

I'm fine with merging your proposed solution and filing an issue to redesign the UI at some later date when we have time to design a more intentional UI.

@blackboxsw
Copy link
Collaborator Author

```yaml
  upgrade: true
snap:
  upgrade: true

Thanks a lot @holmanb. While I agree with your sentiment that allowing cloud-config syntax to be more specific when enabling/disabling upgrade interaction based on specific package mangers would be cleaner, I think there are a couple of reasons I wouldn't want to go this route in cloud-init:

  1. One of cloud-config's assets is that it provides a generalized layer for configuration package_upgrade which insulates admins from accounting for the differences of package managers in different images/distributions. By providing specialized knobs for any package manager interaction in cloud-init we add complexity and verbosity of config in support of minority corner cases which forces that config-specialization back onto the admin for non-standard cases.
  2. I don't really want to see us grow package-manager specific config keys as this also provides two-ways to do something in cloud-init user-data. I'd prefer we retain a general key package_upgrade: true that meets 99% of cases, and those 1% of cases where it doesn't meet the need, folks use runcmd/bootcmd or a custom script to perform such setup/upgrade because they already know they are in a special case w/ an image that doesn't behave the way they like.
  3. Image providers have made an opinonated decision to define what package managers are installed, active and available in those images. cloud-init's responsibility should be to sensibly detect whether a package manager is availability and enabled with whatever tooling the package manager provides to make an educated decision about whether or not to interact with this package managers in general operations like package install or upgrade. In the general case where an image consumer doesn't wish to interact with the image defaults they have options to: change the user-data they provide to that image to avoid pitfalls (possibly with jinja templates with conditional sections based on distro/image type), customize their own image derivatives removing services/tools that they don't want to interact with or avoid using those type of images.

I don't want to block the idea outright, but I also don't really want to prioritize this specialization of config keys where possible unless we think it's handling a significant/frequent use case as I feel it potential dilutes the opinionated default behavior of cloud-init.

@holmanb
Copy link
Member

holmanb commented Aug 6, 2024

One of cloud-config's assets is that it provides a generalized layer for configuration package_upgrade which insulates admins from accounting for the differences of package managers in different images/distributions.

Using the same configuration across different package managers is itself a minority use case, yet this was chosen as a key design point for cloud-init's package management design.

Package managers have different semantics. Trying to force multiple package managers to share the same cloud-config keys forces us to attempt to coerce the semantics to look identical when they are clearly different (see the bug that this fixes as evidence). The current design sacrifices elegance, user control, and intuitive design in attempting to meet its goal of a common UI for different package managers. Examples:

  1. What is the package_update key supposed to mean to the snap package manager?
  2. Why do we have an abstraction layer yet require specific modules for apt, snap, yum, apk, and zypper?
  3. Why does configuring the behavior of a single package manager require using multiple configuration modules (the above mentioned ones plus package-update-upgrade-install)?
  4. The abstraction layer is leaky: it fails to meet its objectives because it requires users to define package names when using the packages key. Not only are packages named differently on different package managers, but there is also frequently no 1:1 mapping between a package on one distro and a package on another distro.

By providing specialized knobs for any package manager interaction in cloud-init we add complexity and verbosity of config in support of minority corner cases which forces that config-specialization back onto the admin for non-standard cases.

Perhaps, but how much new complexity is that, really?

A new cloud-init user that uses Ubuntu is likely to know that apt and snap are their package managers. If they try to search for the snap or apt cloud-config module in the documentation, they will be surprised to find that telling cloud-init to update snap or apt packages has nothing to do with either the apt or the snap module.

in some cases this will simplify the configuration as it is to complicate it or add verbosity:

consider a user that doesn't want snap packages upgraded but does want a ppa added and packages

package_update: true
package_upgrade: true
packages: [neovim]
snap:
  commands:
    00: snap refresh --hold=forever
apt:
  sources:
      neovim:
          source: ppa:neovim-ppa/stable

versus the hypothetical:

apt:
  update: true
  upgrade: true
  packages: [neovim]
  sources:
      neovim:
          source: ppa:neovim-ppa/stable

The second one groups elements related to apt, follows the rule of one top-level key per module, and is no more verbose than the first - much cleaner.

Compare this to the added complexity of a user that just wants both apt and snap packages upgraded.

currently:

package_update: true
package_upgrade: true

versus the hypothetical:

snap:
  upgrade: true
apt:
  update: true
  upgrade: true

It is three more lines of configuration, but under the hypothetical one, a user that understands Ubuntu packaging can read it and intuitively understand what is supposed to happen - the added verbosity actually adds value by making the expected behavior more intuitive.

Another point to consider is that most users are likely to know their distro better than they know cloud-init. Cloud-init's abstraction layer requires the user to learn a new set of semantics from the package managers that they already know. For example, what does package_upgrade: true mean on Ubuntu? Apparently that means that both snap and apt packages should be updated - but that detail does not exist in the documentation and must be inferred. Our abstraction layer adds new semantics, but fails to document them, and then requires ugly workarounds like this[1] when the semantics that cloud-init has chosen don't match user needs. Imagine trying to document the behavior of package_upgrade on Ubuntu (which we really should do, if we want to keep it this way). What would it say and where would it go? I don't see anywhere in the documentation that is an appropriate fit. Trying to assume the "correct" behavior for the user adds complexity, where a slightly more verbose configuration format would make expected behavior much more obvious and wouldn't hide anything.

I think that the point about "specialized knobs per package manager adding complexity" misses the fact that having a cloud-config that maps more cleanly to package manager behavior actually reduces complexity, because it would remove the burden of mapping different semantics of each package managers from cloud-init's code - making documentation simpler and cloud-init configuration much more intuitive since it would simply be expose behavior of the package managers which the user already knows.

I don't really want to see us grow package-manager specific config keys as this also provides two-ways to do something in cloud-init user-data.

I didn't recommend keeping the old keys[2]. Anybody that requires support for multiple distros could still accomplish the same behavior using jinja templates. Anybody that wants to use multiple distros likely needs to do this anyways due to the aforementioned package naming differences, which undermines the benefit promised by a generalization layer. Alternatively one could just include cloud-config for multiple package managers and cloud-init would interact correctly with whichever ones are installed.

I'd prefer we retain a general key package_upgrade: true that meets 99% of cases, and those 1% of cases where it doesn't meet the need, folks use runcmd/bootcmd or a custom script to perform such setup/upgrade because they already know they are in a special case w/ an image that doesn't behave the way they like.

I don't think we have real statistics to quote on which use cases are actually covered, and anyone who cares about boot speed but doesn't need snaps may want to do this.

Part of the value of cloud-config is its (mostly) declarative nature, but that value is undercut because going back to a bash script is required frequently for trivial things such as wanting apt packages upgraded but not snap.

Image providers have made an opinonated decision to define what package managers are installed, active and available in those images.

The current design is opinionated, but I wouldn't call it tasteful. I still think that we can make it better: both opinionated and tasteful.

cloud-init's responsibility should be to sensibly detect whether a package manager is availability and enabled with whatever tooling the package manager provides to make an educated decision about whether or not to interact with this package managers in general operations like package install or upgrade.

That's what it does currently, but I'm still not convinced that this is what it should do. It's not intuitive. It's not beautiful. It's not simple. It's not easily maintainable.

In the general case where an image consumer doesn't wish to interact with the image defaults they have options to: change the user-data they provide to that image to avoid pitfalls (possibly with jinja templates with conditional sections based on distro/image type), customize their own image derivatives removing services/tools that they don't want to interact with or avoid using those type of images.

I don't want to block the idea outright, but I also don't really want to prioritize this specialization of config keys where possible unless we think it's handling a significant/frequent use case as I feel it potential dilutes the opinionated default behavior of cloud-init.

I'm still fine with merging this as a short term workaround, but I also still think that we can design something better, which is why I was suggesting that we continue this conversation in a separate issue in my last comment:

I'm fine with merging your proposed solution and filing an issue to redesign the UI at some later date when we have time to design a more intentional UI.

The mere existence of this bug and workaround is yet another signal that the abstraction layer design concept is flawed and could use a re-think. Distros and package managers are different under the covers, and attempting to pretend otherwise yields code that has higher maintenance cost and more complex behavior which complicates user documentation and makes cloud-init harder to use and debug.

[1] as proposed in this PR:

package_upgrade: true
snap:
  commands:
    00: snap refresh --hold=forever

[2] they violate the "top level key per module" design

@TheRealFalcon
Copy link
Member

@holmanb

Part of the value of cloud-config is its (mostly) declarative nature, but that value is undercut because going back to a bash script is required frequently for trivial things such as wanting apt packages upgraded but not snap.

I disagree with this...quite strongly actually. In a perfect world where we can perfectly design everything, no bash scripts are needed. But we're not perfect, and it's important to have escape hatches. Does the use of a trivial runcmd indicate a gap in cloud-init? Yep! But if we had taken 5 years to perfectly design and implement cloud-init, it would have been DOA while still missing half of the functionality people actually need.

It's easy to see flaws in designs after the fact, but if I think back to when the package manager to distro relationship was 1:1 and all package managers essentially performed the same operations, the current design seems reasonable to me. I agree that your current design proposal is better given the introduction of snaps, but at what cost? We have the actual coding and the evitable breakage we'd introduce, adding new top-level keys, deprecating the old ones, and supporting both for a significant (5+ years ?) time-period. The end result being that package management may be slightly easier to grok for new users, and we can update apts and snaps independently. To me that comes out as a lot of work and forcing change upon existing users for little benefit.

To echo @blackboxsw :

In the general case where an image consumer doesn't wish to interact with the image defaults they have options to: change the user-data they provide to that image to avoid pitfalls (possibly with jinja templates with conditional sections based on distro/image type), customize their own image derivatives removing services/tools that they don't want to interact with or avoid using those type of images.

This workaround is to accommodate a use case where snap is installed but being intentionally blocked by a firewall. The only reason we're adding this workaround is because we changed existing behavior that we now can't change back. If this had been released to devel only, we would have marked the issue as "won't fix".

That said, I'm ok adding your design proposal as an enhancement issue, but unless something else changes in the package management landscape, I don't see it as a priority.

"top level key per module"

Has this ever been a design goal of cloud-init? Is it currently? If so, that is news to me.

@holmanb
Copy link
Member

holmanb commented Aug 6, 2024

Part of the value of cloud-config is its (mostly) declarative nature, but that value is undercut because going back to a bash script is required frequently for trivial things such as wanting apt packages upgraded but not snap.

I disagree with this...quite strongly actually.

@TheRealFalcon Which part specifically do you disagree with? I agree with every statement in your last comment, but I don't see any contradiction with mine so I don't understand the source of disagreement.

There are only two things that I have said we should do:

  1. file an issue to redesign the UI
  2. document the semantics of package_upgrade

It sounds like there is agreement on the first point, and I'm not hearing objections on the second either, nor with the design proposal.

Cloud-init has made many design compromises over the years in order to ship code and add value, and I want for us to iterate on the principles of design so that the design of future features are approached with new insight - whether that is a redesigned package management UI or something else that is deemed more important.

"top level key per module"

Has this ever been a design goal of cloud-init? Is it currently? If so, that is news to me.

Design goal? There is no stated design goal for this that I'm aware of. Many of the newer configuration modules in cloud-init were designed this way, and avoiding the top level keeps things tidier. I can write it down somewhere in the docs as a preference for future modules if you want, but I thought that it went without saying that configuration modules directly exposing their specific configurations at the global namespace is less organized and more prone to future name conflicts. What are your thoughts, is it better, worse, or indifferent for a configuration module to populate the top level with multiple keys?

There are pros and cons to what I proposed above and the way it is currently, and ultimately the project needs to provide value to users so code needs to ship. I've said multiple times that I'm fine with doing exactly that - lets ship this code.

Let me know if I should file the issue.

@TheRealFalcon
Copy link
Member

Which part specifically do you disagree with?

In that particular line quote, that the value of cloud-config is undercut by bash scripts. I think it's actually the opposite. The bash scripts are the escape hatches that people need.

But to the larger point, your previous large post argues that the current solution is subpar and what the ideal solution should look like. While I agree on those points, I think the current solution is good enough that I don't think it's worth the burden to give it any kind of priority. Perhaps I misread your intent, but given how much was said, it seemed you were arguing that this is a task worth doing sooner than "when convenient", which historically for this project has meant basically never.

There are only two things that I have said we should do

I'm ok with both of these as long as we're on the same page about priority.

I thought that it went without saying that configuration modules directly exposing their specific configurations at the global namespace is less organized and more prone to future name conflicts. What are your thoughts, is it better, worse, or indifferent for a configuration module to populate the top level with multiple keys?

Yeah, I agree that is bad. The wording of "rule of one top-level key per module" came off as stricter to me than you probably intended. I was thinking that in your ideal cloud-config, every module has exactly one corresponding top-level key and vice versa. If that's not what you meant then we're probably on the same page.

@holmanb
Copy link
Member

holmanb commented Aug 7, 2024

Which part specifically do you disagree with?

In that particular line quote, that the value of cloud-config is undercut by bash scripts.

I think that maybe you misunderstood what I was saying. I didn't say that the value of cloud-config is undercut by bash scripts. I said that the value of cloud-config is undercut because going back to a bash script is required frequently for trivial things i.e. cloud-config doesn't implement basic features and so people are forced to use bash scripts for these basic features - which might make one wonder why they should bother with cloud-config at all.

The bash scripts are the escape hatches that people need.

100% agreed. We need them, because basic features in cloud-config are missing. We need them, which decreases the value of cloud-config because it forces reliance on a secondary (non-imperative) paradigm. Cloud-config would be more valuable if the user wasn't forced to use bash to work around missing features.

Perhaps I misread your intent, but given how much was said, it seemed you were arguing that this is a task worth doing sooner than "when convenient", which historically for this project has meant basically never.

I didn't mean sooner than when convenient. @blackboxsw had already mentioned priority and I hadn't objected - I just wanted to illustrate the shortcomings of the current design since a defense for it had been presented.

There are only two things that I have said we should do

I'm ok with both of these as long as we're on the same page about priority.

+1

I thought that it went without saying that configuration modules directly exposing their specific configurations at the global namespace is less organized and more prone to future name conflicts. What are your thoughts, is it better, worse, or indifferent for a configuration module to populate the top level with multiple keys?

Yeah, I agree that is bad. The wording of "rule of one top-level key per module" came off as stricter to me than you probably intended. I was thinking that in your ideal cloud-config, every module has exactly one corresponding top-level key and vice versa. If that's not what you meant then we're probably on the same page.

I didn't mean to imply that multiple top level keys should never happen in the same module. I just think that it's a cleaner design and is how I would recommend design future modules unless there is a good reason not to.

@blackboxsw
Copy link
Collaborator Author

Using the same configuration across different package managers is itself a minority use case, yet this was chosen as a key design point for cloud-init's package management design.

Package managers have different semantics. Trying to force multiple package managers to share the same cloud-config keys forces us to attempt to coerce the semantics to look identical when they are clearly different (see the bug that this fixes as evidence). The current design sacrifices elegance, user control, and intuitive design in attempting to meet its goal of a common UI for different package managers. Examples:

I agree that defining a common cloud-config key for various package managers does sacrifice user control by providing a more generic/opinionated approach to package manager interactions.

1. What is the `package_update` key supposed to mean to the snap package manager?

Good point, though our opinionated general design involves making decisions for each package manager as to what the default should mean for that package mgmt backend. In this case, our package update for snap NOOPs because default behavior of snap as a package manager is to refresh those packages per any --hold requests or configuration. For apt/yum/zypper it means to invoke the update operation on the package manager because the operation is appropriate to those package managers to refresh local cache of package availability and metadata. But, on snap automatically performs and update operation during refresh windows, so that operation should not make sense given the unique package manager behavior.

2. Why do we have an abstraction layer yet require specific modules for [apt](https://docs.cloud-init.io/en/latest/reference/modules.html#apt-configure), [snap](https://docs.cloud-init.io/en/latest/reference/modules.html#snap), [yum](https://docs.cloud-init.io/en/latest/reference/modules.html#yum-add-repo), [apk](https://docs.cloud-init.io/en/latest/reference/modules.html#apk-configure), and [zypper](https://docs.cloud-init.io/en/latest/reference/modules.html#zypper-add-repo)?

We have separate abstractions for apk repos/apt repos/yum repos because some of the configuration options exposed are a 1:1 replacement for repo-specific configuration syntax, publication streams or named keys. Given that the repository setup is tightly coupled to a repo configuration syntax, that generalization doesn't make as much sense as say, installing a named package.

3. Why does configuring the behavior of a single package manager require using multiple configuration modules (the above mentioned ones plus [package-update-upgrade-install](https://docs.cloud-init.io/en/latest/reference/modules.html#package-update-upgrade-install))?

This is not ideal that we have a workaround to offer alternatives to avoid changing behavior on stable downstreams as you've mentioned (and we all agree there). But in this particular fix case, the investment in delivering a solution that surfaces specific knobs per package manager is not worth the prioritization of that work for a corner-case where the package manager is installed, active, but disabled via external firewall rules that cloud-init has no visibility to.

4. The abstraction layer is leaky: it fails to meet its objectives because it requires users to define package names when using the `packages` key. Not only are packages named differently on different package managers, but there is also frequently no 1:1 mapping between a package on one distro and a package on another distro.

While this is true, there is only one cloud-config key that admins would need to know in order to install a named package on any distro "packages" . They may need to know that the package is named X on Alpine environments or Y on deb-based environments and Z on Rocky, but they wouldn't need to also alter apk/apt/yum/zypper keys. I understand that you are saying it doesn't buy them much, because they still have to know the specifics of the package they are installing, so why not clearly represent the package manager they know they are talking to as well.

By providing specialized knobs for any package manager interaction in cloud-init we add complexity and verbosity of config in support of minority corner cases which forces that config-specialization back onto the admin for non-standard cases.

Perhaps, but how much new complexity is that, really?

Not much new complexity in cloud-init code, just a smidge complexity in admin's having to state what their target package manager is when deciding to install/remove/upgrade packages. For packages which happen to have the same name in multiple distros they now have to represent:

apt: 
   packages: [apache2]

vs

apk:
   packages: [apache2]

vs

yum:
  packages: [apache2]

A new cloud-init user that uses Ubuntu is likely to know that apt and snap are their package managers. If they try to search for the snap or apt cloud-config module in the documentation, they will be surprised to find that telling cloud-init to update snap or apt packages has nothing to do with either the apt or the snap module.

Yes, but instead they find a generic "Package Update Upgrade Install" which applies to any supported distro. I'm even wondering if we'd maybe want to reorder that module name to "Package Install Update Upgrade" because I'd presume most searches in cloud-init would be something like "install software" "package install" "rpm install" etc.

consider a user that doesn't want snap packages upgraded but does want a ppa added and packages
...
snap:
commands:
00: snap refresh --hold=forever
apt:
sources:
neovim:
source: ppa:neovim-ppa/stable

I agree this is busy cloud-config and not ideal for this use case. But, I suggest this this an image that doesn't meet the user's need because it has snap installed and active which they ultimately want disabled.

versus the hypothetical:

apt:
  update: true
  upgrade: true
  packages: [neovim]
  sources:
      neovim:
          source: ppa:neovim-ppa/stable

The second one groups elements related to apt, follows the rule of one top-level key per module, and is no more verbose than the first - much cleaner.

Compare this to the added complexity of a user that just wants both apt and snap packages upgraded.

currently:

package_update: true
package_upgrade: true

versus the hypothetical:

snap:
  upgrade: true
apt:
  update: true
  upgrade: true

It is three more lines of configuration, but under the hypothetical one, a user that understands Ubuntu packaging can read it and intuitively understand what is supposed to happen - the added verbosity actually adds value by making the expected behavior more intuitive.

Another point to consider is that most users are likely to know their distro better than they know cloud-init. Cloud-init's abstraction layer requires the user to learn a new set of semantics from the package managers that they already know.

Agreed,yet I think this sentiment applies to using cloud-init's user-data for any operation. Admins trying to drive config changes to a fleet of VMs as they are launched will still have that learning curve of how to express that operation in cloud-config even if we are representing generalized operations that look appropriate under the scope of a package manager specific top-level key. They'll still have to learn to lay the update/upgrade/packages under a top-level apt/yum/zypper/apk key.

For example, what does package_upgrade: true mean on Ubuntu? Apparently that means that both snap and apt packages should be updated - but that detail does not exist in the documentation and must be inferred. Our abstraction layer adds new semantics, but fails to document them, and then requires ugly workarounds like this[1] when the semantics that cloud-init has chosen don't match user needs.

Yes, this is a "workaround" for what I would consider a case upstream cloud-init should not add to mainstream support yet. I think this conversation is one that warrants a separate upstream issue as you have offered so we have have a more detailed discussion in a focused setting outside the scope of this workaround.

It's worth extended discussion on this and the general goal of cloud-init cloud-config primitives. What is our goal here and how do we we balance the tension of exposing environment specific details in user-data versus generalized operations for features which cloud-config user-data supports. I can justify the differences we see in repo setup modules for yum/zypper/apt/apk formats warranting these separate keys as there are significantly separate behaviors and expressions of those config switches. For package managers, maybe we are taking too shallow a view in presuming that the majority of operations are similar enough to justify a general approach to their interaction.

Imagine trying to document the behavior of package_upgrade on Ubuntu (which we really should do, if we want to keep it this way).

+100

I don't really want to see us grow package-manager specific config keys as this also provides two-ways to do something in cloud-init user-data.

I didn't recommend keeping the old keys[2]. Anybody that requires support for multiple distros could still accomplish the same behavior using jinja templates. Anybody that wants to use multiple distros likely needs to do this anyways due to the aforementioned package naming differences, which undermines the benefit promised by a generalization layer. Alternatively one could just include cloud-config for multiple package managers and cloud-init would interact correctly with whichever ones are installed.

I meant, two ways to do things being that you have to speciffy apt: packages: [...] vs yum: packages: [...]. instead of just packages: [...]. But your point stands, people that are actually installing a package by name, ultimately already know they are talking to apt/yum/apk/zypper because they had to lookup that named pkg for their target distro. So they have already had to obtain all the knowledge necessary to know whether they are talking to apt or apk anyway.

I'd prefer we retain a general key package_upgrade: true that meets 99% of cases, and those 1% of cases where it doesn't meet the need, folks use runcmd/bootcmd or a custom script to perform such setup/upgrade because they already know they are in a special case w/ an image that doesn't behave the way they like.

I don't think we have real statistics to quote on which use cases are actually covered, and anyone who cares about boot speed but doesn't need snaps may want to do this.

Yes those are not real statistics for this case, I'm trying to say if we have a case where we think it meets the majority of users we should provide the best general approach for the environments we support. If we think that we are not accommodating enough of a significant community need, then we can prioritize that work to make sure it is better expressed in cloud-config.

Part of the value of cloud-config is its (mostly) declarative nature, but that value is undercut because going back to a bash script is required frequently for trivial things such as wanting apt packages upgraded but not snap.

Yes, but we also can't accept every feature gap that cloud-config has and craft a fully expressive solution that is inclusive of all cases. We have to bear in mind that we don't have the bandwidth to "achieve all the things". If filing an issue helps at least capture that gap for further discussion, I'm on board. But, whether it will result in a near-term solution to incorporate this feature, I am uncertain about priority and bandwith available to meet this need.

Image providers have made an opinionated decision to define what package managers are installed, active and available in those images.

The current design is opinionated, but I wouldn't call it tasteful. I still think that we can make it better: both opinionated and tasteful.

cloud-init's responsibility should be to sensibly detect whether a package manager is availability and enabled with whatever tooling the package manager provides to make an educated decision about whether or not to interact with this package managers in general operations like package install or upgrade.

That's what it does currently, but I'm still not convinced that this is what it should do. It's not intuitive. It's not beautiful. It's not simple. It's not easily maintainable.

If referring to this PR I'd agree, providing a workaround for something already introduced that we can't back out because it will break other consumers is not beautiful, simple or intuitive. So I can agree there,

What is not simple or intuitive about cloud-init having awareness enough to interact with a package manager that is installed and available if presented with an declared operation in cloud-config? If cloud-init grew flatpak support and an image contained flatpak commands, I'd want my packages: [...] requests to be performed using flatpak if it was the package manager available in my environment. I guess you are taking issue partially with the undocumented magic whereby cloud-init determines which package manager to interact with and how this general approach leaves much to be desired where folks may want more control to limit some of that package manager interaction.

I'm still fine with merging this as a short term workaround, but I also still think that we can design something better, which is why I was suggesting that we continue this conversation in a separate issue in my last comment:

Yes let's do this on a separate issue. As we are seeing there are differing perspectives here on what is considered 'best' for the user.

@holmanb holmanb self-requested a review August 7, 2024 15:33
@blackboxsw blackboxsw merged commit 0787d62 into canonical:main Aug 7, 2024
23 checks passed
@holmanb
Copy link
Member

holmanb commented Aug 7, 2024

For packages which happen to have the same name in multiple distros they now have to represent:

Yes, but again, I think that individuals using the same cloud-config on different distros is a minority use case, not a majority one. This abstraction layer benefits (barely - alternative methods to do the same thing exist as previously mentioned) a minority user group while sacrificing intuitive and elegant design, simplicity of implementation, etc.

I agree this is busy cloud-config and not ideal for this use case. But, I suggest this this an image that doesn't meet the user's need because it has snap installed and active which they ultimately want disabled.

This is behavior that a user may want to choose at runtime via cloud-config. It would be more user-friendly for cloud-config to design a more expressive interface than to expect users to build images without snap installed.

I don't really want to see us grow package-manager specific config keys as this also provides two-ways to do something in cloud-init user-data.
I meant, two ways to do things being that you have to speciffy apt: packages: [...] vs yum: packages: [...]. instead of just packages: [...].

I see what you're saying, but this isn't two ways to do something. These are two different things, after all. The costs associated with multiple implementations doesn't apply to this: it wouldn't require some sort of prioritization or conflict resolution logic (the cost of duplicate methods to code implementation/maintenance). I don't think that the costs normally induced by multiple implementations apply to the proposed design - yet the prioritization / conflict resolution logic certainly has to exist under the current abstraction.

I'd prefer we retain a general key package_upgrade: true that meets 99% of cases, and those 1% of cases where it doesn't meet the need, folks use runcmd/bootcmd or a custom script to perform such setup/upgrade because they already know they are in a special case w/ an image that doesn't behave the way they like.

I don't think we have real statistics to quote on which use cases are actually covered, and anyone who cares about boot speed but doesn't need snaps may want to do this.

Yes those are not real statistics for this case, I'm trying to say if we have a case where we think it meets the majority of users we should provide the best general approach for the environments we support. If we think that we are not accommodating enough of a significant community need, then we can prioritize that work to make sure it is better expressed in cloud-config.

Package managers are a key point of differentiation from one distro to the next. A paper-thin abstraction layer which attempts to make them all behave the same way is an exercise in busywork - and a recipe for confusion in the case of snap on Ubuntu. I don't think that very many package manager use cases are covered at all by this abstraction - outside of wanting a system to be up to date.

Part of the value of cloud-config is its (mostly) declarative nature, but that value is undercut because going back to a bash script is required frequently for trivial things such as wanting apt packages upgraded but not snap.

Yes, but we also can't accept every feature gap that cloud-config has and craft a fully expressive solution that is inclusive of all cases. We have to bear in mind that we don't have the bandwidth to "achieve all the things".

We're on the same page about that. This is just a really basic and simple feature which is much harder to solve than it should be - and the cause for this pain can be traced to the current abstraction layer UI and the snap implementation that was bolted onto it despite the semantic mismatch.

Image providers have made an opinionated decision to define what package managers are installed, active and available in those images.

The current design is opinionated, but I wouldn't call it tasteful. I still think that we can make it better: both opinionated and tasteful.

cloud-init's responsibility should be to sensibly detect whether a package manager is availability and enabled with whatever tooling the package manager provides to make an educated decision about whether or not to interact with this package managers in general operations like package install or upgrade.

That's what it does currently, but I'm still not convinced that this is what it should do. It's not intuitive. It's not beautiful. It's not simple. It's not easily maintainable.

If referring to this PR I'd agree, providing a workaround for something already introduced that we can't back out because it will break other consumers is not beautiful, simple or intuitive. So I can agree there,
What is not simple or intuitive about cloud-init having awareness enough to interact with a package manager that is installed and available if presented with an declared operation in cloud-config?

Simply installing from a list using a single package manager is simple, sure. But that's not where we are at today. We have two package managers on Ubuntu but try to use the same old user interface. Cloud-init cannot make the best decision for the user when multiple package managers are installed without extra information.

How is cloud-init supposed to do the right thing when only a single packages: [] list of packages is provided on a two-package manager system? Hint: failover from one manager to the next is not a the correct answer. All that someone would need to do to introduce malicious code to an unsuspecting user is to check for package names that have changed or been discontinued in the archive from one series to the next and register a package with the same name containing whatever malicious code they want in the snap store. If this were to happen then unsuspecting users that switch from one series to the next might end up with a bitcoin miner running and possibly never realize it.

Cloud-init deals with the single-package-list problem by making its list of packages contain not just package names but overload this optionally with the name of a package manager so that the user can specify which manager to install a package with.

packages:
- snap:  # why is this a member of the list and not an named object / dict?
  - certbot
  - [juju, --edge]
  - [lxd, --channel=5.15/stable]
- apt: [mg]

This is needlessly complex as it yet again requires the user to break out of the abstraction in order to even just tell the system to try to install a package using one package manager - the simplest of tasks.

If we had wanted to maintain backwards compatibility with this abstraction layer while also following the existing patterns in this module, then we could done this to allow users to specify which package manager they want to update and upgrade:

package_update: true
package_upgrade:
  - apt: true
  - snap: false

But yet again we see in an example like this that users need to specify a package manager-specific detail despite the abstraction layer supposedly providing value.

If cloud-init grew flatpak support and an image contained flatpak commands, I'd want my packages: [...] requests to be performed using flatpak if it was the package manager available in my environment.

I wouldn't. Security concerns aside, as a user there are knobs unique to flatpak that I would probably want exposed to me without having to switch back and forth between abstracted and dedicated interfaces. Trying to shove yet another package manager with different semantics into the same user interface would likely introduce a whole other set of issues (just like this one, which resulted from adding snap support under the same UI). I'd prefer a clean new module - free of the historical baggage and bugs of the legacy UI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants